Abstract

Given the possibility of communication systems failing catastrophically, we investigate limits to communicating 
over channels that fail at random times. These channels are finite-state semi-Markov channels. We show that 
communication with arbitrarily small probability of error is not possible. Making use of results in finite blocklength 
channel coding, we determine sequences of blocklengths that optimize transmission volume communicated at fixed 
maximum message error probabilities. We provide a partial ordering of communication channels. A dynamic 
programming formulation is used to show the structural result that channel state feedback does not improve 
performance.

"a communication channel. . . might be inoperative because of an amplifier failure, a broken or cut telephone wire, ..." 

-- I. M. Jacobs [2]

I. Introduction 

Physical systems have a tendency to fail at random times [3]. This is true whether considering communication
systems embedded in sensor networks that may run out of energy [4], synthetic communication systems embedded
in biological cells that may die [5]–[8], communication systems embedded in spacecraft that may enter black holes
[9], or communication systems embedded in oceans with undersea cables that may be cut [10]. In these scenarios
and beyond, failure of the communication system may be modeled as communication channel death.

As such, it is of interest to study information-theoretic limits on communicating over channels that die at
random times. This paper gives results on the fundamental limits of what is possible and what is impossible
when communicating over channels that die. Communication with arbitrarily small probability of error (Shannon
reliability) is not possible for any positive communication volume; however, a suitably defined notion of $\eta$-reliability
is possible. Schemes that optimize communication volume for a given level of $\eta$-reliability are developed herein.

The central trade-off in communicating over channels that die is in the lengths of codeword blocks. Longer blocks
improve communication performance as classically known, whereas shorter blocks have a smaller probability of
being prematurely terminated due to channel death. In several settings, a simple greedy algorithm for determining the
sequence of blocklengths yields a certifiably optimal solution. We also develop a dynamic programming formulation
to optimize the ordered integer partition that determines the sequence of blocklengths. Besides algorithmic utility,
solving the dynamic program demonstrates the structural result that channel state feedback does not improve
performance.

The optimization of codeword blocklengths is reminiscent of frame size control in wireless networks [11]–[14];
however, such techniques are used in conjunction with automatic repeat request protocols and are motivated by
amortizing protocol information. Moreover, those results demonstrate the benefit of adapting to either channel state
or decision feedback. In contrast, we show that adaptation to channel state provides no benefit for channels that die.

Limits on channel coding with finite blocklength [15]–[21] are central to our development. Indeed, channels
that die bring the notion of finite blocklength to the fore and provide a concrete physical reason to step back
from infinity.² Notions of outage in wireless communication [22], [23] and lost letters in postal channels [24] are
similar to channel death, except that neither outage nor lost letters are permanent conditions. Therefore blocklength
asymptotics are useful to study those channel models but are not useful for channels that die. Recent work with
motivations similar to those of this paper provides the outage capacity of a wireless channel [25].

The remainder of the paper is organized as follows. Section II defines discrete memoryless channels that die
and shows that these channels have zero Shannon capacity. Section III states the communication system model and
also fixes our novel performance criteria. Section IV shows that our notion of Shannon reliability is not achievable,
strengthening the result of zero Shannon capacity, and then provides a communication scheme and determines
its performance. Section V optimizes performance for several death distributions using either a greedy algorithm
or a dynamic programming algorithm. Optimization demonstrates that channel state feedback does not improve
performance. Section VI discusses the partial ordering of channels. Section VII suggests several extensions to this
work.



II. Channel Model 



Consider a channel with finite input alphabet $\mathcal{X}$ and finite output alphabet $\mathcal{Y}$. It has an alive state s = a when
it acts like a noisy discrete memoryless channel (DMC) and a dead state s = d when it erases the input.³ Assume
throughout the paper that the DMC from the alive state has zero-error capacity [28] equal to zero.⁴

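For example, if the channel acts like a binary symmetric channel (BSC) with crossover probability $0 < \varepsilon < 1$ in
the alive state, with $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y} = \{0, 1, ?\}$, then the transmission matrix in the alive state is

\[
p(y|x, s = a) = p_a(y|x) = \begin{bmatrix} 1-\varepsilon & \varepsilon & 0 \\ \varepsilon & 1-\varepsilon & 0 \end{bmatrix}, \tag{1}
\]

and the transmission matrix in the dead state is

\[
p(y|x, s = d) = p_d(y|x) = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \tag{2}
\]

where rows are indexed by the inputs $x \in \{0, 1\}$ and columns by the outputs $y \in \{0, 1, ?\}$.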



The channel starts in state s = a and then transitions to s = d at some random time T, where it remains for all time
thereafter. That is, the channel is in state a for times n = 1, 2, ..., T and in state d for times n = T + 1, T + 2, ....
The death time distribution is denoted $p_T(t)$. Note that there is always a finite $t^\dagger$ such that $p_T(t^\dagger) > 0$.
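The channel model above can be simulated in a few lines. The following is a minimal sketch, assuming a BSC in the alive state and, purely for illustration, a geometric death time; the function names and the parameter `alpha` are hypothetical and do not come from the paper.

```python
import random

ERASURE = "?"  # output symbol produced in the dead state


def simulate_channel(inputs, eps, death_time, rng):
    """Pass binary inputs through a BSC(eps) that dies after time death_time.

    Times are 1-indexed as in the text: the channel is alive for
    n = 1, ..., death_time and dead (output '?') for n > death_time.
    """
    outputs = []
    for n, x in enumerate(inputs, start=1):
        if n > death_time:
            outputs.append(ERASURE)   # dead state erases the input
        elif rng.random() < eps:
            outputs.append(1 - x)     # alive state: bit is flipped
        else:
            outputs.append(x)         # alive state: bit passes through
    return outputs


def sample_geometric_death(alpha, rng):
    """Draw T with Pr[T = t] = alpha * (1 - alpha)**(t - 1), t = 1, 2, ...

    The geometric law is only an illustrative choice of death distribution.
    """
    t = 1
    while rng.random() >= alpha:
        t += 1
    return t


rng = random.Random(0)
T = sample_geometric_death(0.1, rng)
y = simulate_channel([0, 1] * 10, eps=0.01, death_time=T, rng=rng)
print(T, y)
```

Once the first `?` appears, every later output is also `?`, which reflects the permanence of channel death.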



A. Finite-State Semi-Markov Channel 

Channels that die can be classified as finite-state channels (FSCs) [31, Sec. 4.6]. 
Proposition 1: A channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is a finite-state channel.

Proof: Follows by definition, since the channel has two states. ■
Channels that die have semi-Markovian [32, Sec. 4.8], [33, Sec. 5.7] properties.

Definition 1: A semi-Markov process changes state according to a Markov chain but takes a random amount of
time between changes. More specifically, it is a stochastic process with states from a discrete alphabet $\mathcal{S}$, such that
whenever it enters state s, $s \in \mathcal{S}$:

• The next state it will enter is state r with probability that depends only on $s, r \in \mathcal{S}$.

• Given that the next state to be entered is state r, the time until the transition from s to r occurs has distribution
that depends only on $s, r \in \mathcal{S}$.

Definition 2: The Markovian sequence of states of a semi-Markov process is called the embedded Markov chain 
of the semi-Markov process. 

Definition 3: A semi-Markov process is irreducible if its embedded Markov chain is irreducible. 

Proposition 2: A channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ has a channel state sequence that is a
non-irreducible semi-Markov process.

2 The phrase "back from infinity" is borrowed from J. Ziv's 1997 Shannon Lecture. 

3 Our results can be extended to cover cases where the channel acts like other channels [26], [27] in the alive state.
4 If the channel is noiseless in the alive state, the problem is similar to settings where fountain codes [29] are used in the point-to-point
case and growth codes [30] are used in the network case.
Proof: When in state a, the next state is d with probability 1 and, given that the next state is to be d, the time
until the transition from a to d has distribution $p_T(t)$. When in state d, the next state is d with probability 1. Thus,
the channel state sequence is a semi-Markov process.

The semi-Markov state process is not irreducible because the a state of the embedded Markov chain is transient. ■

Note that when T is a geometric random variable, the channel state process forms a Markov chain, with transient 
state a and recurrent, absorbing state d. 
There are further special classes of FSCs. 

Definition 4: An FSC is a finite-state semi-Markov channel (FSSMC) if its state sequence forms a semi-Markov
process.

Definition 5: An FSC is a finite-state Markov channel (FSMC) if its state sequence forms a Markov chain. 
Proposition 3: A channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is an FSSMC and is an FSMC when T is
geometric.

Proof: Follows from Props. 1 and 2. ■
FSMCs have been widely studied in the literature [31], [34], [35], particularly the panic button/child's toy channel
of Gallager (cited at p. 26 and p. 103) and the Gilbert-Elliott channel and its extensions [36], [37].

In contrast, FSSMCs do not seem to have been specifically studied in information theory. There are a few works
[38]–[40] that give semi-Markov channel models for wireless communications systems but do not provide
information-theoretic characterizations.

B. Capacity is Zero 

A channel that dies has Shannon capacity equal to zero. To show this, first notice that if the initial state of a
channel that dies were not fixed, then it would be an indecomposable FSC [31, Sec. 4.6], where the effect of the
initial state dies away.

Proposition 4: If the initial state of a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is not fixed, then it is an
indecomposable FSC.

Proof: The embedded Markov chain for a channel that dies has a unique absorbing state d. ■
Indecomposable FSCs have the property that the upper capacity, defined in [31, (4.6.6)], and lower capacity,
defined in [31, (4.6.3)], are identical [31, Thm. 4.6.4]. This can be used to show that the capacity of a channel that
dies is zero.

Proposition 5: The Shannon capacity, C, of a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is zero.

Proof: Although the initial state is $s_1 = a$ here, temporarily suppose that $s_1$ may be either a or d. Then the
channel is indecomposable by Prop. 4.

The lower capacity $\underline{C}$ equals the upper capacity $\overline{C}$ for indecomposable channels by [31, Thm. 4.6.4]. The
information rate of a memoryless $p_d(y|x)$ 'dead' channel is clearly zero for any input distribution, so the lower
capacity $\underline{C} = 0$. Thus the Shannon capacity for a channel that dies with initial alive state is $C = \overline{C} = 0$. ■

III. Communication System 

In order to information theoretically characterize a channel that dies, a communication system that contains the 
channel is described. 

We have an information stream (like i.i.d. equiprobable bits), which can be grouped into a sequence of k messages,
$(W_1, W_2, \ldots, W_k)$. Each message $W_i$ is drawn from a message set $\mathcal{W}_i = \{1, 2, \ldots, M_i\}$. Each message $W_i$ is
encoded into a channel input codeword $X_1^{n_i}(W_i)$ and these codewords $(X_1^{n_1}(W_1), X_1^{n_2}(W_2), \ldots, X_1^{n_k}(W_k))$ are
transmitted in sequence over the channel. A noisy version of this codeword sequence is received,
$Y_1^{n_1 + n_2 + \cdots + n_k}(W_1, W_2, \ldots, W_k)$. The receiver then guesses the sequence of messages using an appropriate
decoding rule g, to produce $(\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_k) = g(Y_1^{n_1 + n_2 + \cdots + n_k})$. The $\hat{W}_i$ are drawn from alphabets
$\mathcal{W}_i^\ominus = \mathcal{W}_i \cup \{\ominus\}$, where the $\ominus$ message indicates the decoder declaring an erasure. The receiver
makes an error on message i if $\hat{W}_i \neq W_i$ and $\hat{W}_i \neq \ominus$.

Block coding results are typically expressed with the concern of sending one message rather than k messages as
here.⁵

5 Tree codes are beyond the scope of this paper, since we desire to communicate messages. A reformulation of communicating over
channels that die using tree codes [41, Ch. 10] with early termination [42] would, however, be interesting. In fact, communicating over
channels that die using convolutional codes with sequential decoding would be very natural, but would require performance criteria different
from the ones developed herein.
System definitions can be formalized as follows.

Definition 6: An $(M_i, n_i)$ individual message code for a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ consists
of:

1) An individual message index set $\{1, 2, \ldots, M_i\}$, and

2) An individual message encoding function $f_i : \{1, 2, \ldots, M_i\} \to \mathcal{X}^{n_i}$.

The individual message index set $\{1, 2, \ldots, M_i\}$ is denoted $\mathcal{W}_i$, and the set of individual message codewords
$\{f_i(1), f_i(2), \ldots, f_i(M_i)\}$ is called the individual message codebook.

Definition 7: An $(M_i, n_i)_{i=1}^{k}$ code for a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is a sequence of k
individual message codes, $(M_i, n_i)_{i=1}^{k}$, in the sense of comprising:

1) A sequence of individual message index sets $\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_k$,

2) A sequence of individual message encoding functions $f = (f_1, f_2, \ldots, f_k)$, and

3) A decoding function $g : \mathcal{Y}^{\sum_{i=1}^{k} n_i} \to \mathcal{W}_1^\ominus \times \mathcal{W}_2^\ominus \times \cdots \times \mathcal{W}_k^\ominus$.

There is no essential loss of generality in assuming that the decoding function g is decomposed into a sequence
of individual message decoding functions $g = (g_1, g_2, \ldots, g_k)$, where $g_i : \mathcal{Y}^{n_i} \to \mathcal{W}_i^\ominus$, when individual messages
are chosen independently, due to this independence and the conditional memorylessness of the channel.

To define performance measures, we assume that the decoder operates on an individual message basis. That is,
when applying the communication system, let $\hat{W}_1 = g_1(Y_1^{n_1})$, $\hat{W}_2 = g_2(Y_{n_1+1}^{n_1+n_2})$, and so on.

For the sequel, we make a further assumption on the operation of the decoder. 

Assumption 1: If all $n_i$ channel output symbols used by individual message decoder $g_i$ are not ?, then the range
of $g_i$ is $\mathcal{W}_i$. If any of the $n_i$ channel output symbols used by individual message decoder $g_i$ are ?, then $g_i$ maps to
$\ominus$.

This assumption corresponds to the physical properties of a communication system where the decoder fails catas- 
trophically. Once the decoder fails, it cannot perform any decoding operations, and so the ? symbols in the channel 
model of system failure must be ignored. 

A. Performance Measures 

We formally write the notion of error for the communication system as follows. 
Definition 8: For all $1 \le w \le M_i$, let

\[
\lambda_w(i) = \Pr[\hat{W}_i \neq w \mid W_i = w, \hat{W}_i \neq \ominus]
\]

be the conditional message probability of error given that the ith individual message is w.
Definition 9: The maximal probability of error for an $(M_i, n_i)$ individual message code is

\[
\lambda_{\max}(i) = \max_{w \in \mathcal{W}_i} \lambda_w(i).
\]

Definition 10: The maximal probability of error for an $(M_i, n_i)_{i=1}^{k}$ code is

\[
\lambda_{\max} = \max_{i \in \{1, \ldots, k\}} \lambda_{\max}(i).
\]

Performance criteria weaker than traditional in information theory are defined, since the Shannon capacity of a
channel that dies is zero (Prop. 5). In particular, we define formal notions of how much information is transmitted
using a code and how long it takes.

Definition 11: The transmission time of an $(M_i, n_i)_{i=1}^{k}$ code is $N = \sum_{i=1}^{k} n_i$.

Definition 12: The expected transmission volume of an $(M_i, n_i)_{i=1}^{k}$ code is

\[
V = E_T\left[ \sum_{i \in \{1, \ldots, k \mid \hat{W}_i \neq \ominus\}} \log M_i \right].
\]

Notice that although declared erasures do not lead to errors, they do not contribute transmission volume either. 
The several performance criteria for a code may be combined together. 
Definition 13: Given $0 < \eta < 1$, a pair of numbers $(N_0, V_0)$ (where $N_0$ is a positive integer and $V_0$ is
non-negative) is said to be an achievable transmission time-volume at $\eta$-reliability if there exists, for some k, an
$(M_i, n_i)_{i=1}^{k}$ code for the channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ such that

\[ \lambda_{\max} \le \eta, \tag{3} \]
\[ N \le N_0, \text{ and} \tag{4} \]
\[ V \ge V_0. \tag{5} \]

Moreover, $(N_0, V_0)$ is said to be an achievable transmission time-volume at Shannon reliability if it is an
achievable transmission time-volume at $\eta$-reliability for all $0 < \eta < 1$.

IV. Limits on Communication 

Having defined the notion of achievable transmission time-volume at various levels of reliability, the goal of this 
work is to demarcate what is achievable. 

A. Shannon Reliability is Not Achievable 

Not only is the Shannon capacity of a channel that dies zero, but also there is no $V > 0$ such that $(N, V)$
is an achievable transmission time-volume at Shannon reliability. A coding scheme that always declares erasures
would achieve zero error probability (and therefore Shannon reliability) but would not provide positive transmission
volume; this is also not allowed under Assumption 1.

Lemmas are stated and proved after the proof of the main proposition. For brevity, the proof is limited to the 
alive-BSC case, but can be extended to general alive-DMCs by choosing the two most distant letters in y for 
constructing the repetition code, among other things. 

Proposition 6: For a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$, there is no $V > 0$ such that $(N, V)$ is an
achievable transmission time-volume at Shannon reliability.

Proof: From the error probability viewpoint, transmitting longer codes is not harder than transmitting shorter
codes (Lem. 1) and transmitting smaller codes is not harder than transmitting larger codes (Lem. 2). Hence, the
desired result follows from showing that even the longest and smallest code that has positive expected transmission
volume cannot achieve Shannon reliability.

Clearly the longest and smallest code uses a single individual message code of length $n_1 \to \infty$ and size $M_1 = 2$.
Among such codes, transmitting the binary repetition code is not harder than transmitting any other code (Lem. 3).
Hence showing that the binary repetition code cannot achieve Shannon reliability yields the desired result.

Consider transmitting a single $(M_1 = 2, n_1)$ individual message code that is simply a binary repetition code over
a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$.

Let $\mathcal{W}_1 = \{00000\ldots, 11111\ldots\}$, where the two codewords are of length $n_1$. Assume that the all-zeros codeword
and the all-ones codeword are each transmitted with probability 1/2 and measure average probability of error, since
average error probability lower bounds $\lambda_{\max}(1)$ [31, Problem 5.32]. The transmission time is $N = n_1$; let $N \to \infty$.
The expected transmission volume is $\log 2 > 0$.

Under equiprobable signaling over a BSC, the minimum error probability decoder is the maximum likelihood
decoder, which in turn is the minimum distance decoder [43, Problem 2.13].

The scenario corresponds to binary hypothesis testing over a BSC($\varepsilon$) with T observations (since after the channel
dies, the output symbols do not help with hypothesis testing). Since there is a finite $t^\dagger$ such that $p_T(t^\dagger) > 0$, there
is a fixed constant K such that $\lambda_{\max} > K > 0$ for any realization T = t.

Thus Shannon reliability is not achievable. ■ 
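The hypothesis-testing step of this proof can be made concrete numerically. The sketch below assumes minimum distance (majority) decoding of the binary repetition code from t alive-state observations, with ties broken by a fair coin; the function names and the tie-breaking rule are assumptions made for illustration. It shows that the event T = t† alone keeps the average error probability bounded away from zero, no matter how long the codeword is.

```python
from math import comb


def repetition_error(t, eps):
    """Error probability of minimum-distance decoding of the binary
    repetition code from t BSC(eps) observations (fair-coin tie-breaking)."""
    p = 0.0
    for flips in range(t + 1):
        prob = comb(t, flips) * eps**flips * (1 - eps) ** (t - flips)
        if 2 * flips > t:
            p += prob          # majority of observations flipped: error
        elif 2 * flips == t:
            p += 0.5 * prob    # tie: wrong with probability 1/2
    return p


def error_floor(t_dagger, p_t_dagger, eps):
    """Contribution of the event T = t_dagger to the average error probability."""
    return p_t_dagger * repetition_error(t_dagger, eps)


for t in (1, 5, 25):
    print(t, repetition_error(t, 0.1))
print(error_floor(5, 0.2, 0.1))
```

Since `repetition_error(t, eps)` is at least `eps**t` for every finite t, the floor is strictly positive whenever the death time puts positive mass on some finite t†.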

Lemma 1: When transmitting over the alive state's memoryless channel $p_a(y|x)$, let the maximal probability of
error $\lambda_{\max}(i)$ for an optimal $(M_i, n_i)$ individual message code and minimum probability of error individual decoder
$g_i$ be $\lambda_{\max}(i; n_i)$. Then $\lambda_{\max}(i; n_i + 1) \le \lambda_{\max}(i; n_i)$.

Proof: Consider the optimal blocklength-$n_i$ individual message code/decoder, which achieves $\lambda_{\max}(i; n_i)$. Use
it to construct a blocklength-$(n_i + 1)$ individual message code that appends a dummy symbol to each codeword and an associated
decoder that operates by ignoring this last symbol. The error performance of this (suboptimal) code/decoder is clearly
$\lambda_{\max}(i; n_i)$, and so the optimal performance can only be better: $\lambda_{\max}(i; n_i + 1) \le \lambda_{\max}(i; n_i)$. ■



Lemma 2: When transmitting over the alive state's memoryless channel $p_a(y|x)$, let the maximal probability
of error $\lambda_{\max}(i)$ for an optimal $(M_i, n_i)$ individual message code and minimum probability of error individual
decoder $g_i$ be $\lambda_{\max}(i; M_i)$. Then $\lambda_{\max}(i; M_i) \le \lambda_{\max}(i; M_i + 1)$.

Proof: Follows from sphere-packing principles. ■ 

Lemma 3: When transmitting over the alive state's memoryless channel $p_a(y|x)$, the optimal $(M_i = 2, n_i)$
individual message code can be taken as a binary repetition code.

Proof: Under minimum distance decoding (which yields the minimum error probability [43, Problem 2.13])
for a code transmitted over a BSC, increasing the distance between codewords can only reduce error probability.
The repetition code has maximum Hamming distance between codewords. ■

Notice that Prop. 6 also directly implies Prop. 5, providing an alternate proof.

Corollary 1: The Shannon capacity of a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is zero.



B. Finite Blocklength Channel Coding 

Before developing an optimal scheme for $\eta$-reliable communication over a channel that dies, finite blocklength
channel coding is reviewed.

Under our definitions, traditional channel coding results [15], [17]–[21] provide information about individual
message codes, determining the achievable trios $(n_i, M_i, \lambda_{\max}(i))$. In particular, the largest possible $M_i$ for a given
$n_i$ and $\lambda_{\max}(i)$ is denoted $M^*(n_i, \lambda_{\max}(i))$.

The purpose of this work is not to improve upper and lower bounds on finite blocklength channel coding,
but to use existing results to study channels that die. In fact, for the sequel, simply assume that the function
$M^*(n_i, \lambda_{\max}(i))$ is known, as are codes/decoders that achieve this value. In principle, optimal individual message
codes may be found through exhaustive search [17], [44]. Although algebraic notions of code quality do not directly
imply error probability quality [45], perfect codes such as the Hamming or Golay codes may also be optimal in
certain limited cases.

Recent results comparing upper and lower bounds around Strassen's normal approximation to $\log M^*(n_i, \lambda_{\max}(i))$
[46] have demonstrated that the approximation is quite good [19].

Remark 1: We assume that optimal $M^*(n_i, \eta)$-achieving individual message codes are known. Exact upper and
lower bounds to $\log M^*(n_i, \eta)$ can be substituted to make our results precise. For numerical demonstrations, we
will further assume that optimal codes have performance given by Strassen's approximation.

The following expression for $\log M^*(n_i, \eta)$ that first appeared in [46] is also given as [19, Thm. 6].

Lemma 4: Let $M^*(n_i, \eta)$ be the largest size of an individual message code with blocklength $n_i$ and maximal
error probability upper bounded by $\lambda_{\max}(i) \le \eta$. Then, for any DMC with capacity C and $0 < \eta < 1/2$,

\[
\log M^*(n_i, \eta) = n_i C - \sqrt{n_i \rho}\, Q^{-1}(\eta) + O(\log n_i),
\]

where

\[
Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt,
\]

\[
\rho = \min_{X : C = I(X;Y)} \operatorname{var}\left[ \log \frac{p_{Y|X}(y|x)}{p_Y(y)} \right],
\]

and standard asymptotic notation [47] is used.

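For the BSC($\varepsilon$), the approximation (ignoring the $O(\log n_i)$ term above) is:

\[
\log M^* \approx n_i (1 - h_2(\varepsilon)) - \sqrt{n_i \varepsilon (1 - \varepsilon)}\, Q^{-1}(\eta) \log_2 \frac{1 - \varepsilon}{\varepsilon}, \tag{6}
\]

where $h_2(\cdot)$ is the binary entropy function. This BSC expression first appeared in [48].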

For intuition, we plot the approximate $\log M^*(n_i, \eta)$ function for a BSC($\varepsilon$) in Fig. 1(a). Notice that $\log M^*$ is
zero for small $n_i$ since no code can achieve the target error probability $\eta$. Also notice that $\log M^*$ is a monotonically
increasing function of $n_i$. Moreover, notice in Fig. 1(b) that even when normalized, $(\log M^*)/n_i$ is a monotonically
increasing function of $n_i$. Therefore longer blocks provide more 'bang for the buck.' The curve in Fig. 1(b)
asymptotically approaches capacity.
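The behavior described above can be reproduced from approximation (6) directly. The following is a minimal sketch in which the function names are illustrative, logarithms are taken base 2 (so values are in bits), and negative values are clipped to zero to reflect that no code meets the target error probability at very short blocklengths; the clipping is an assumption of the sketch rather than part of (6).

```python
from math import log2, sqrt
from statistics import NormalDist


def binary_entropy(eps):
    """Binary entropy function h_2 in bits, for 0 < eps < 1."""
    return -eps * log2(eps) - (1 - eps) * log2(1 - eps)


def q_inv(eta):
    """Inverse of the Gaussian tail function Q."""
    return NormalDist().inv_cdf(1 - eta)


def log_m_star_bsc(n, eps, eta):
    """Approximation (6) for a BSC(eps), in bits, clipped at zero."""
    if n <= 0:
        return 0.0
    value = (n * (1 - binary_entropy(eps))
             - sqrt(n * eps * (1 - eps)) * q_inv(eta) * log2((1 - eps) / eps))
    return max(value, 0.0)


eps, eta = 0.01, 0.001
print("capacity:", 1 - binary_entropy(eps))
for n in (1, 4, 10, 100, 1000):
    bits = log_m_star_bsc(n, eps, eta)
    print(n, bits, bits / n)  # volume and normalized rate
```

With the parameters of Fig. 1, the normalized value `bits / n` grows with n and stays below the capacity, which is the trend the text describes.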
Fig. 1. (a) The expression (6) for $\varepsilon = 0.01$ and $\eta = 0.001$. (b) Normalized version, $(\log M^*(n_i, \eta))/n_i$, for $\varepsilon = 0.01$ and $\eta = 0.001$.
The capacity of a BSC($\varepsilon$) is $1 - h_2(\varepsilon) = 0.92$.



C. $\eta$-reliable Communication

We now describe a coding scheme that achieves positive expected transmission volume at $\eta$-reliability. Survival
probability of the channel plays a key role in measuring performance.

Definition 14: The survival function of a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$ is $\Pr[T > t]$, is denoted
$R_T(t)$, and satisfies

\[
R_T(t) = \Pr[T > t] = 1 - \sum_{\tau=1}^{t} p_T(\tau) = 1 - F_T(t),
\]

where $F_T$ is the cumulative distribution function.
$R_T(t)$ is a non-increasing function.
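As a small illustration of Definition 14, the survival function can be tabulated from a death time probability mass function. This is a sketch under the assumption that the distribution is given as a finite list; the function names and the geometric example are illustrative only.

```python
from itertools import accumulate


def survival_function(pmf):
    """Return R with R[t] = Pr[T > t] for t = 0, ..., len(pmf).

    pmf[t - 1] holds p_T(t) for t = 1, ..., len(pmf).
    """
    cdf = [0.0] + list(accumulate(pmf))  # F_T(0), F_T(1), ...
    return [1.0 - f for f in cdf]


def geometric_survival(alpha, t):
    """Survival function when Pr[T = t] = alpha * (1 - alpha)**(t - 1)."""
    return (1 - alpha) ** t


print(survival_function([0.5, 0.25, 0.25]))
print([geometric_survival(0.1, t) for t in range(4)])
```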

Proposition 7: The transmission time-volume

\[
\left( N = \sum_{i=1}^{k} n_i, \quad V = \sum_{i=1}^{k} R_T(e_i) \log M^*(n_i, \eta) \right)
\]

is achievable at $\eta$-reliability for any sequence $(n_i)_{i=1}^{k}$ of individual message codeword lengths, where $e_0 = 0$,
$e_1 = n_1$, $e_2 = n_1 + n_2$, ..., $e_k = \sum_{i=1}^{k} n_i$.
Proof:

Code Design: A target error probability $\eta$ and a sequence $(n_i)_{i=1}^{k}$ of individual message codeword lengths
are fixed. Construct a length-k sequence of $(M_i, n_i)$ individual message codes and individual decoding functions
$(\mathcal{W}_i, f_i, g_i)$ that achieve optimal performance. The size of $\mathcal{W}_i$ is $|\mathcal{W}_i| = M^*(n_i, \eta)$. Note that individual decoding
functions $g_i$ have range $\mathcal{W}_i$ rather than $\mathcal{W}_i^\ominus$.

Encoding: A message $W_1 = w_1$ is selected uniformly at random from the message set $\mathcal{W}_1$. The mapping of this
message into $n_1$ channel input letters, $X_{e_0+1}^{e_1} = f_1(w_1)$, is transmitted in channel usage times $n = e_0 + 1, e_0 + 2, \ldots, e_1$.

Then a message $W_2 = w_2$ is selected uniformly at random from the message set $\mathcal{W}_2$. The mapping of this message
into $n_2$ channel input letters, $X_{e_1+1}^{e_2} = f_2(w_2)$, is transmitted in channel usage times $n = e_1 + 1, e_1 + 2, \ldots, e_2$.

This procedure continues until the last individual message code in the code is transmitted. That is, a message
$W_k = w_k$ is selected uniformly at random from the message set $\mathcal{W}_k$. The mapping of this message into $n_k$ channel
input letters, $X_{e_{k-1}+1}^{e_k} = f_k(w_k)$, is transmitted in channel usage times $n = e_{k-1} + 1, e_{k-1} + 2, \ldots, e_k$.

We refer to channel usage times $n \in \{e_{i-1} + 1, e_{i-1} + 2, \ldots, e_i\}$ as the ith transmission epoch.

Decoding: For decoding, the channel output symbols for each epoch are processed separately. If any of the
channel output symbols in an epoch are erasure symbols ?, then a decoding erasure is declared for the message
in that epoch, i.e. $\hat{W}_i = \ominus$. Otherwise, the individual message decoding function $g_i : \mathcal{Y}^{n_i} \to \mathcal{W}_i$ is applied to
obtain $\hat{W}_i = g_i(Y_{e_{i-1}+1}^{e_i})$.

Performance Analysis: Having defined the communication scheme, we measure the error probability, transmission 
time, and expected transmission volume. 

The decoder will either produce an erasure or use an individual message decoder $g_i$. When $g_i$ is used, the
maximal error probability of individual message code error is bounded as $\lambda_{\max}(i) \le \eta$ by construction. Since
declared erasures do not lead to error, and since all $\lambda_{\max}(i) \le \eta$, it follows that

\[
\lambda_{\max} \le \eta.
\]

The transmission time is simply $N = \sum_{i=1}^{k} n_i$.

Recall the definition of expected transmission volume:

\[
E\left[ \sum_{i \in \{1, \ldots, k \mid \hat{W}_i \neq \ominus\}} \log M_i \right] = \sum_{i=1}^{k} E\left[ \mathbf{1}\{\hat{W}_i \neq \ominus\} \log M_i \right]
\]

and the fact that the channel produces the erasure symbol ? for all channel usage times after death, n > T, but not
before. Combining this with the log-size of an optimal code, $\log M^*(n_i, \eta)$, leads to the expression

\[
\sum_{i=1}^{k} \Pr[T > e_i] \log M^*(n_i, \eta),
\]

since all individual message codewords that are received in their entirety before the channel dies are decoded using
$g_i$, whereas any individual message codewords that are even partially cut off are declared $\ominus$.

Recalling the definition of the survival function, the expected transmission volume of the communication scheme
is

\[
\sum_{i=1}^{k} R_T(e_i) \log M^*(n_i, \eta)
\]

as desired. ■ 
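The volume expression of Prop. 7 is simple to evaluate for any candidate sequence of blocklengths. The sketch below takes the survival function and the log-size function as callables; the geometric survival function and the surrogate `toy_log_m_star` are hypothetical stand-ins chosen only to make the example runnable, and are not the paper's numerical setup.

```python
from itertools import accumulate


def expected_volume(blocklengths, survival, log_m_star):
    """Return V = sum_i R_T(e_i) * log M*(n_i, eta) for the given blocklengths."""
    epoch_ends = accumulate(blocklengths)  # e_i = n_1 + ... + n_i
    return sum(survival(e) * log_m_star(n)
               for e, n in zip(epoch_ends, blocklengths))


# Hypothetical inputs for illustration only.
def geometric_survival(t, alpha=0.05):
    return (1 - alpha) ** t


def toy_log_m_star(n):
    # Surrogate with a square-root penalty, clipped at zero.
    return max(0.0, n - 2.0 * n ** 0.5)


print(expected_volume([10, 10, 10], geometric_survival, toy_log_m_star))
print(expected_volume([30], geometric_survival, toy_log_m_star))
```

Comparing the two printed values for the same total time N = 30 shows the trade-off between a few long blocks and several short ones.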
Prop. 7 is valid for any choice of $(n_i)_{i=1}^{k}$. Since $(\log M^*)/n_i$ is monotonically increasing, it is better to use
individual message codes that are as long as possible. With longer individual message codes, however, there is a
greater chance of many channel usages being wasted if the channel dies in the middle of transmission. The basic
trade-off is captured in picking the set of values $\{n_1, n_2, \ldots, n_k\}$. For fixed and finite N, this involves picking an
ordered integer partition $n_1 + n_2 + \cdots + n_k = N$. We optimize this choice in Section V.
D. Converse Arguments

Since we simply have operational expressions and no informational expressions in our development, as per
Remark 1, and since optimal individual message codes and individual message decoders are assumed to be used,
it may seem as though converse arguments are not required. This would indeed follow if the following two things
were true, both of which follow from Assumption 1: first, that there is no benefit in trying to decode the last partially
erased message block; second, that there is no benefit to errors-and-erasures decoding [49] by the $g_i$ for codewords
that are received before channel death. Under Assumption 1, Prop. 7 gives the best performance possible.

One might wonder whether Assumption 1 is needed. That there would be no benefit in trying to decode the last
partially erased block follows from the conjecture that an optimal individual message code would have no latent
redundancy that could be exploited to achieve a $\lambda_{\max}(i = \text{last}) < \eta$, but this is a property of the actual optimal
code.

Understanding the possibility of errors-and-erasures decoding [49] by the individual message decoders also
requires knowing properties of actual optimal codes. It is unclear how the choice of threshold in errors-and-erasures
decoding would affect the expected transmission volume

\[
\sum_{i=1}^{k} (1 - \xi_i) R_T(e_i) \log M^*(n_i, \xi_i, \eta),
\]

where $\xi_i$ would be the specified erasure probability for individual message i, and $M^*(n_i, \xi_i, \eta)$ would be the
maximum individual message codebook size under erasure probability $\xi_i$ and maximum error probability $\eta$.

What we can say, however, is that at the level of Strassen's approximation (up to the log n term), $\log M^*(n_i, \xi_i, \eta)$
and $\log M^*(n_i, \eta)$ are the same [50, Thm. 47].



V. Optimizing the Communication Scheme 

In Section IV-C, we had not optimized the lengths of the individual message codes; we do so here. For fixed
$\eta$ and N, we maximize the expected transmission volume V over the choice of the ordered integer partition
$n_1 + n_2 + \cdots + n_k = N$:

\[
\max_{(n_i)_{i=1}^{k} :\, \sum_{i=1}^{k} n_i = N} \; \sum_{i=1}^{k} R_T(e_i) \log M^*(n_i, \eta). \tag{7}
\]

For finite N, this optimization can be carried out by an exhaustive search over all $2^{N-1}$ ordered integer partitions.
If the death distribution $p_T(t)$ has finite support, there is no loss of generality in considering only finite N. Since
exhaustive search has exponential complexity, however, there is value in trying to use a simplified algorithm. A
dynamic programming formulation for the finite horizon case is developed in Section V-C. The next subsection
develops a greedy algorithm which is applicable to both the finite and infinite horizon cases and yields the optimal
solution for certain problems.
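For small N, the exhaustive search mentioned above is easy to write down. The following is a minimal sketch that enumerates every ordered integer partition (composition) of N and keeps the one with the largest value of objective (7); the function names are illustrative, and the survival and log-size functions are passed in as callables supplied by the user.

```python
from itertools import accumulate


def compositions(n):
    """Yield all ordered integer partitions (compositions) of n as tuples."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def exhaustive_search(total, survival, log_m_star):
    """Maximize objective (7) over all 2**(total - 1) compositions of total."""
    def volume(lengths):
        return sum(survival(e) * log_m_star(m)
                   for e, m in zip(accumulate(lengths), lengths))

    best = max(compositions(total), key=volume)
    return best, volume(best)


# Hypothetical example: channel surely alive through time 6, dead afterwards.
best, value = exhaustive_search(
    8, lambda t: 1.0 if t <= 6 else 0.0, lambda n: max(0.0, n - 1.0))
print(best, value)
```

The running time doubles with each unit increase in N, which is why the text turns to greedy and dynamic programming alternatives.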



A. A Greedy Algorithm 

To try to solve the optimization problem (7), we propose a greedy algorithm that optimizes blocklengths $n_i$ one
by one.

Algorithm 1:

1) Maximize $R_T(n_1) \log M^*(n_1, \eta)$ through the choice of $n_1$ independently of any other $n_i$.

2) Maximize $R_T(e_2) \log M^*(n_2, \eta)$ after fixing $e_1 = n_1$, but independently of later $n_i$.

3) Maximize $R_T(e_3) \log M^*(n_3, \eta)$ after fixing $e_2$, but independently of later $n_i$.

4) Continue in the same manner for all subsequent $n_i$.
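The steps of Algorithm 1 can be sketched as follows. This is one possible reading rather than a definitive implementation: the finite search `horizon`, the rule of stopping when no further block adds volume, and the tie-breaking toward the shortest maximizing blocklength are all assumptions of the sketch, and the example inputs are hypothetical.

```python
def greedy_blocklengths(survival, log_m_star, horizon):
    """Choose blocklengths one at a time, each maximizing its own term
    R_T(e_i) * log M*(n_i, eta) given the previously fixed epoch boundary."""
    lengths, end = [], 0
    while end < horizon:
        candidates = range(1, horizon - end + 1)
        best_n = max(candidates,
                     key=lambda n: survival(end + n) * log_m_star(n))
        if survival(end + best_n) * log_m_star(best_n) <= 0:
            break  # no remaining block contributes positive volume
        lengths.append(best_n)
        end += best_n
    return lengths


# Hypothetical example: geometric-like survival and a linear surrogate
# for log M* with a fixed overhead of two symbols.
print(greedy_blocklengths(lambda t: 0.85 ** t,
                          lambda n: max(0.0, n - 2.0),
                          horizon=40))
```

For a memoryless (geometric) survival function every greedy step faces the same problem up to scaling, so the chosen blocklengths are all equal as long as enough of the horizon remains.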

In some cases the greedy algorithm produces an optimal solution.

Proposition 8: The solution produced by the greedy algorithm, $(n_i)$, is locally optimal if

\[
\frac{R_T(e_i) \log M^*(n_i, \eta) - R_T(e_i - 1) \log M^*(n_i - 1, \eta)}{R_T(e_{i+1}) \left[ \log M^*(n_{i+1} + 1, \eta) - \log M^*(n_{i+1}, \eta) \right]} \ge 1 \tag{8}
\]

for each i.
Proof: The solution of the greedy algorithm partitions time using a set of epoch boundaries $(e_i)$. The proof
proceeds by testing whether local perturbation of an arbitrary epoch boundary can improve performance. There are
two possible perturbations: a shift to the left or a shift to the right.

First consider shifting an arbitrary epoch boundary $e_i$ to the right by one. This makes the left epoch longer and
the right epoch shorter. Lengthening the left epoch does not improve performance due to the greedy optimization of
the algorithm. Shortening the right epoch does not improve performance since $R_T(e_{i+1})$ remains unchanged whereas
$\log M^*(n_{i+1}, \eta)$ does not increase since $\log M^*$ is a non-decreasing function of $n_{i+1}$.

Now consider shifting an arbitrary epoch boundary $e_i$ to the left by one. This makes the left epoch shorter and
the right epoch longer. Reducing the left epoch will not improve performance due to greediness, but enlarging the
right epoch might improve performance, so the gain and loss must be balanced.

The loss in performance (a positive quantity) for the left epoch is

\[
\Delta_l = R_T(e_i) \log M^*(n_i, \eta) - R_T(e_i - 1) \log M^*(n_i - 1, \eta)
\]

whereas the gain in performance (a positive quantity) for the right epoch is

\[
\Delta_r = R_T(e_{i+1}) \left[ \log M^*(n_{i+1} + 1, \eta) - \log M^*(n_{i+1}, \eta) \right].
\]

If $\Delta_l \ge \Delta_r$, then perturbation will not improve performance. The condition may be rearranged as

\[
\frac{R_T(e_i) \log M^*(n_i, \eta) - R_T(e_i - 1) \log M^*(n_i - 1, \eta)}{R_T(e_{i+1}) \left[ \log M^*(n_{i+1} + 1, \eta) - \log M^*(n_{i+1}, \eta) \right]} \ge 1.
\]

This is the condition (8), so the left-perturbation does not improve performance. Hence, the solution produced by
the greedy algorithm is locally optimal. ■
Proposition 9: The solution produced by the greedy algorithm, $(n_i)$, is globally optimal if 

\[ \frac{R_T(e_i) \log M^*(n_i, \eta) - R_T(e_i - K_i) \log M^*(n_i - K_i, \eta)}{R_T(e_{i+1}) \left[ \log M^*(n_{i+1} + K_i, \eta) - \log M^*(n_{i+1}, \eta) \right]} \ge 1 \tag{9} \]

for each $i$, and any non-negative integers $K_i < n_i$. 

Proof: The result follows by repeating the argument for local optimality in Prop. 8 for shifts of any admissible 
size $K_i$. ■ 

There is an easily checked special case of global optimality condition (9) under the Strassen approximation, 
given in the forthcoming Prop. 10. 

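Lemma 5: The function $\log M_S^*(z, \eta) - \log M_S^*(z - K, \eta)$ is a non-decreasing function of $z$ for any $K$, where 

\[ \log M_S^*(z, \eta) = zC - \sqrt{z\rho}\, Q^{-1}(\eta) \tag{10} \]

is Strassen's approximation. 

Proof: Essentially follows from the fact that $\sqrt{z}$ is a concave ($\cap$) function in $z$. More specifically, $\sqrt{z}$ satisfies 

\[ -\sqrt{z} + \sqrt{z - K} \le -\sqrt{z + 1} + \sqrt{z + 1 - K} \]

for $K \le z$. Multiplying both sides by the factor $\sqrt{\rho}\, Q^{-1}(\eta)$, which is non-negative for $\eta \le 1/2$, this implies: 

\[ -\sqrt{z}\sqrt{\rho}\, Q^{-1}(\eta) + \sqrt{z - K}\sqrt{\rho}\, Q^{-1}(\eta) \le -\sqrt{z + 1}\sqrt{\rho}\, Q^{-1}(\eta) + \sqrt{z + 1 - K}\sqrt{\rho}\, Q^{-1}(\eta). \]

Adding the positive constant $KC$ to both sides, in the form $zC - zC + KC$ on the left and in the form $(z + 1)C - (z + 1)C + KC$ on the right yields 

\[ zC - \sqrt{z\rho}\, Q^{-1}(\eta) - (z - K)C + \sqrt{(z - K)\rho}\, Q^{-1}(\eta) \le (z + 1)C - \sqrt{(z + 1)\rho}\, Q^{-1}(\eta) - (z + 1 - K)C + \sqrt{(z + 1 - K)\rho}\, Q^{-1}(\eta) \]

and so 

\[ \left[ \log M_S^*(z, \eta) - \log M_S^*(z - K, \eta) \right] \le \left[ \log M_S^*(z + 1, \eta) - \log M_S^*(z + 1 - K, \eta) \right]. \]
■ 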
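
Lemma 5 can be spot-checked numerically. The sketch below evaluates the increments of the (unclamped) Strassen expression on a grid and confirms they do not decrease in $z$; it is a sanity check under assumed example values of $C$, $\rho$, and $\eta$, not a substitute for the proof. The function names are ours.

```python
import math
from statistics import NormalDist


def strassen_log_m(z, cap, rho, eta):
    """Unclamped Strassen expression z*C - sqrt(z*rho)*Qinv(eta)."""
    return z * cap - math.sqrt(z * rho) * NormalDist().inv_cdf(1 - eta)


def increment(z, k, cap, rho, eta):
    """log M_S(z, eta) - log M_S(z - k, eta), the quantity in Lemma 5."""
    return strassen_log_m(z, cap, rho, eta) - strassen_log_m(z - k, cap, rho, eta)


def increments_non_decreasing(k, z_max, cap, rho, eta, tol=1e-12):
    """Check the lemma on z = k, ..., z_max, up to floating-point tolerance."""
    values = [increment(z, k, cap, rho, eta) for z in range(k, z_max + 1)]
    return all(b >= a - tol for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Assumed example values, roughly those of a BSC(0.01) in bits.
    cap, rho, eta = 0.9192, 0.4351, 0.001
    print(all(increments_non_decreasing(k, 200, cap, rho, eta)
              for k in range(1, 11)))
```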



Proposition 10: If the solution produced by the greedy algorithm using Strassen's approximation (10) satisfies 
$n_1 \ge n_2 \ge \cdots \ge n_k$, then condition (9) for global optimality is satisfied. 

Proof: Since $R_T(\cdot)$ is a non-increasing survival function, 

\[ R_T(e_i - K) \ge R_T(e_{i+1}) \tag{11} \]

for the non-negative integer $K$. Since the function $[\log M_S^*(z, \eta) - \log M_S^*(z - K, \eta)]$ is a non-decreasing function 
of $z$ by Lem. 5, and since the $n_i$ are in non-increasing order, 

\[ \log M_S^*(n_i, \eta) - \log M_S^*(n_i - K, \eta) \ge \log M_S^*(n_{i+1} + K, \eta) - \log M_S^*(n_{i+1}, \eta). \tag{12} \]

Taking products of (11) and (12) and rearranging yields the condition: 

\[ \frac{R_T(e_i - K) \left[ \log M_S^*(n_i, \eta) - \log M_S^*(n_i - K, \eta) \right]}{R_T(e_{i+1}) \left[ \log M_S^*(n_{i+1} + K, \eta) - \log M_S^*(n_{i+1}, \eta) \right]} \ge 1. \]

Since $R_T(\cdot)$ is a non-increasing survival function, 

\[ R_T(e_i - K) \ge R_T(e_i) \ge R_T(e_{i+1}). \]

Therefore the global optimality condition (9) is also satisfied, by substituting $R_T(e_i)$ for $R_T(e_i - K)$ in one place. ■ 

B. Geometric Death Distribution 

A common failure mode for systems that do not age is a geometric death time $T$: 

\[ p_T(t) = \alpha (1 - \alpha)^{t - 1}, \]

and 

\[ R_T(t) = (1 - \alpha)^{t}, \]

where $\alpha$ is the death time parameter. 

Proposition 11: When $T$ is geometric, then the solution to (7) under Strassen's approximation yields equal epoch 
sizes. This optimal size is given by 

\[ \arg\max_{\nu} R_T(\nu) \log M^*(\nu, \eta). \]

Proof: Begin by showing that Algorithm 1 will produce a solution with equal epoch sizes. Recall that the 
survival function of a geometric random variable with parameter $0 < \alpha < 1$ is $R_T(t) = (1 - \alpha)^t$. Therefore the 
first step of the algorithm will choose $n_1$ as 

\[ n_1 = \arg\max_{\nu} (1 - \alpha)^{\nu} \log M^*(\nu, \eta). \]

The second step of the algorithm will choose 

\[ n_2 = \arg\max_{\nu} (1 - \alpha)^{n_1} (1 - \alpha)^{\nu} \log M^*(\nu, \eta) = \arg\max_{\nu} (1 - \alpha)^{\nu} \log M^*(\nu, \eta), \]

which is the same as $n_1$. In general, 

\[ n_i = \arg\max_{\nu} (1 - \alpha)^{e_{i-1}} (1 - \alpha)^{\nu} \log M^*(\nu, \eta) = \arg\max_{\nu} (1 - \alpha)^{\nu} \log M^*(\nu, \eta), \]

so $n_1 = n_2 = \cdots$. 

Such a solution satisfies $n_1 \ge n_2 \ge \cdots$ and so it is optimal by Prop. 10. ■ 
The optimal epoch size for geometric death under Strassen's approximation can be found analytically [51, 
Sec. 6.4.2]. Consider the setting when the alive state corresponds to a BSC($\varepsilon$). For fixed crossover probability $\varepsilon$ 
and target error probability $\eta$, the optimal epoch size is plotted as a function of $\alpha$ in Fig. 2. The less likely the 
channel is to die early, the longer the optimal epoch length. 

Fig. 2. Optimal epoch lengths under Strassen's approximation for an $(\varepsilon, \alpha)$ BSC-geometric channel that dies for $\varepsilon = 0.01$ and $\eta = 0.001$. 

Fig. 3. Achievable $\eta$-reliability in sending 5 bits over $(\varepsilon, \alpha)$ BSC-geometric channel that dies. 

Alternatively, rather than fixing $\eta$, one might fix the number of bits to be communicated and find the best level 
of reliability that is possible. Fig. 3 shows the best $\lambda_{\max} = \eta$ that is possible when communicating 5 bits over a 
BSC($\varepsilon$)-geometric($\alpha$) channel that dies. 
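
The trend reported for Fig. 2 can be explored numerically. The sketch below finds $\arg\max_{\nu} (1 - \alpha)^{\nu} \log M_S^*(\nu, \eta)$ by direct search rather than by the analytical route of [51]; the search cap `max_len`, the clamping at zero, and the function names are assumptions of the sketch.

```python
import math
from statistics import NormalDist


def strassen_log_m(n, eps, eta):
    """Strassen approximation for a BSC(eps), in bits, clamped at zero."""
    cap = 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)
    rho = eps * (1 - eps) * math.log2((1 - eps) / eps) ** 2
    q_inv = NormalDist().inv_cdf(1 - eta)
    return max(0.0, n * cap - math.sqrt(n * rho) * q_inv)


def optimal_epoch_length(alpha, eps=0.01, eta=0.001, max_len=2000):
    """Brute-force arg max of (1 - alpha)**nu * log M_S(nu, eta) over 1..max_len."""
    return max(range(1, max_len + 1),
               key=lambda nu: (1 - alpha) ** nu * strassen_log_m(nu, eps, eta))


if __name__ == "__main__":
    for alpha in (0.3, 0.1, 0.05, 0.01):
        print(alpha, optimal_epoch_length(alpha))
```

Smaller values of $\alpha$ should give longer epochs, in line with the remark that a channel less likely to die early favors longer blocks.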

Notice that the geometric death time distribution forms a boundary case for Prop. 10. One can consider discrete 
Weibull death time distributions [52] to see what happens with heavier tails: 

\[ p_T(t) = (1 - \alpha)^{(t - 1)^{\beta}} - (1 - \alpha)^{t^{\beta}}, \]

and 

\[ R_T(t) = (1 - \alpha)^{t^{\beta}}, \]

where $\beta$ is the shape parameter. When $\beta > 1$, the tail is lighter than geometric and when $\beta < 1$, the tail is heavier 
than geometric. 

With heavy-tailed death distributions, the greedy algorithm gives epoch sizes that are non-increasing: $n_1 \ge n_2 \ge 
\cdots$, and therefore optimal; it is better to send long blocks first and then send shorter ones. 
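
The discrete Weibull expressions above can be written down directly. The following minimal sketch (function names ours) implements the survival function and the probability mass function, and can be used to confirm that $\beta = 1$ recovers the geometric case and that $\beta < 1$ gives a heavier tail.

```python
def weibull_survival(t, alpha, beta):
    """R_T(t) = (1 - alpha) ** (t ** beta) for integer t >= 0."""
    return (1 - alpha) ** (t ** beta)


def weibull_pmf(t, alpha, beta):
    """p_T(t) = R_T(t - 1) - R_T(t) for integer t >= 1."""
    return weibull_survival(t - 1, alpha, beta) - weibull_survival(t, alpha, beta)


if __name__ == "__main__":
    alpha = 0.05
    for beta in (0.5, 1.0, 2.0):
        # Survival probability at t = 50 for heavy, geometric, and light tails.
        print(beta, weibull_survival(50, alpha, beta))
```

Such a survival function can be passed unchanged to any routine that expects $R_T(\cdot)$.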

C. Dynamic Programming 

The greedy algorithm of the previous section solves (7) under certain conditions. For finite $N$, a dynamic program 
(DP) may be used to solve (7) under any conditions. To develop the DP formulation [53], we assume that channel 
state feedback (whether the channel output is ? or whether it is some other symbol) is available to the transmitter; 
however, solving the DP will show that channel state feedback is not required. 

System Dynamics: 

\[ \begin{bmatrix} \zeta_n \\ \omega_n \end{bmatrix} = \begin{bmatrix} (\zeta_{n-1} + 1) s_{n-1} \\ \omega_{n-1} \kappa_{n-1} \end{bmatrix} \tag{13} \]

for $n = 1, 2, \ldots, N + 1$. The following state variables, disturbances, and controls are used: 

• $\zeta_n \in \mathbb{Z}^*$ is a state variable that counts the location in the current transmission epoch, 

• $\omega_n \in \{0, 1\}$ is a state variable that indicates whether the channel is alive (1) or dead (0), 

• $\kappa_n \in \{0, 1\} \sim \mathrm{Bern}(R_T(n))$ is a disturbance that kills (0) or revives (1) the channel in the next time step, 
and 

• $s_n \in \{0, 1\}$ is a control input that starts (0) or continues (1) a transmission epoch in the next time step. 

Initial State: Since the channel starts alive (note that $R_T(1) = 1$) and since the first transmission epoch starts at 
the beginning of time, 

\[ \begin{bmatrix} \zeta_1 \\ \omega_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \tag{14} \]

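Additive Cost: Transmission volume $\log M^*(\zeta_n + 1, \eta)$ is credited if the channel is alive (i.e. $\omega_n = 1$) and the 
transmission epoch is to be restarted in the next time step (i.e. $1 - s_n = 1$). This implies a cost function 

\[ c_n(\zeta_n, \omega_n, s_n) = -(1 - s_n)\, \omega_n \log M^*(\zeta_n + 1, \eta). \tag{15} \]

This is negative so that smaller is better. 

Terminal Cost: There is no terminal cost: $c_{N+1} = 0$. 

Cost-to-go: From time $n$ to time $N + 1$ is: 

\[ E_{\kappa} \left\{ \sum_{i=n}^{N} c_i(\zeta_i, \omega_i, s_i) \right\} = -E_{\kappa} \left\{ \sum_{i=n}^{N} (1 - s_i)\, \omega_i \log M^*(\zeta_i + 1, \eta) \right\}. \]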

Notice that the state variable $\zeta_n$ which counts epoch time is known to the transmitter and is determinable by 
the receiver through transmitter simulation. The state variable $\omega_n$ indicates the channel state and is known to the 
receiver by observing the channel output. It may be communicated to the transmitter through the channel state 
feedback. The following result follows directly. 

Proposition 12: A communication scheme that follows the dynamics (13) and additive cost (15) achieves the 
transmission time-volume 

\[ \left( N,\; V = -E \left[ \sum_{n=1}^{N} c_n \right] \right) \]

at $\eta$-reliability. 

DP may be used to find the optimal control policy $(s_n)$. 

Proposition 13: The optimal $-V$ for the initial state (14), dynamics (13), additive cost (15), and no terminal 
cost is equal to the cost of the solution produced by the dynamic programming algorithm. 

Proof: The system described by initial state (14), dynamics (13), and additive cost (15) is in the form of the 
basic problem of dynamic programming [53, Sec. 1.2]. Thus the result follows from [53, Prop. 1.3.1]. ■ 

The DP optimization computations are now carried out; standard $J$ notation is used for cost [53]. The base case 
at time $N + 1$ is 

\[ J_{N+1}(\zeta_{N+1}, \omega_{N+1}) = c_{N+1} = 0. \]

In proceeding backwards from time $N$ to time 1: 

\[ J_n(\zeta_n, \omega_n) = \min_{s_n \in \{0, 1\}} E_{\kappa_n} \left\{ c_n(\zeta_n, \omega_n, s_n) + J_{n+1}\left( f_n(\zeta_n, \omega_n, s_n, \kappa_n) \right) \right\}, \]

for $n = 1, 2, \ldots, N$, where 

\[ f_n(\zeta_n, \omega_n, s_n, \kappa_n) = \begin{bmatrix} \zeta_{n+1} & \omega_{n+1} \end{bmatrix}^T = \begin{bmatrix} (\zeta_n + 1) s_n & \omega_n \kappa_n \end{bmatrix}^T. \]

Substituting our additive cost function yields: 

\[ \begin{aligned} J_n(\zeta_n, \omega_n) &= \min_{s_n \in \{0, 1\}} -E_{\kappa_n} \left\{ (1 - s_n)\, \omega_n \log M^*(\zeta_n + 1, \eta) \right\} + E_{\kappa_n} \{ J_{n+1} \} \\ &= \min_{s_n \in \{0, 1\}} -(1 - s_n)\, R_T(n) \log M^*(\zeta_n + 1, \eta) + E_{\kappa_n} \{ J_{n+1} \}. \end{aligned} \tag{16} \]

Notice that the state variable $\omega_n$ dropped out of the first term when we took the expectation with respect to the 
disturbance $\kappa_n$. This is true for each stage in the DP. 

Proposition 14: For a channel that dies $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T(t), \mathcal{Y})$, channel state feedback does not improve 
performance. 

Proof: By repeating the expectation calculation in (16) for each stage $n$ in the stage-by-stage DP algorithm, 
it is verified that state variable $\omega$ does not enter into the stage optimization problem. Hence the transmitter does 
not require channel state feedback to determine the optimal signaling strategy. ■ 

D. A Dynamic Programming Example 

To provide some intuition on the choice of epoch lengths, we present a short example. Consider the channel that 
dies with $\mathcal{X} = \{0, 1\}$, $\mathcal{Y} = \{0, 1, ?\}$, $p_a(y|x)$ given by (1) with $\varepsilon = 0.01$, $p_d(y|x)$ given by (2), and $p_T(t)$ that is 
uniform over a finite horizon of length 40 (disallowing death in the first time step): 

\[ p_T(t) = \begin{cases} 1/39, & t = 2, \ldots, 40, \\ 0, & \text{otherwise.} \end{cases} \]

Our goal is to communicate with $\eta$-reliability, $\eta = 0.001$. 

Since the death distribution has finite support, there is no benefit to transmitting after death is guaranteed. Suppose 
some sequence of $n_i$'s is chosen arbitrarily: $(n_1 = 13, n_2 = 13, n_3 = 13, n_4 = 1)$. This has expected transmission 
volume (under the Strassen approximation) 

\[ \begin{aligned} V &= \sum_{i=1}^{4} R_T(e_i) \log M^*(n_i, \eta) \\ &\overset{(a)}{=} \log M^*(13, 0.001) \sum_{i=1}^{3} R_T(e_i) \\ &= \log M^*(13, 0.001) \left[ R_T(13) + R_T(26) + R_T(39) \right] \\ &= 4.600\, [9/13 + 14/39 + 1/39] = 4.954 \text{ bits,} \end{aligned} \]

where (a) removes the fourth epoch since uncoded transmission cannot achieve $\eta$-reliability. 

If we run the DP algorithm to optimize the ordered integer partition, we get the result $(n_1 = 20, n_2 = 12, n_3 = 
6, n_4 = 2)$.$^6$ Notice that since the solution is in order, the greedy algorithm would also have succeeded. The expected 
transmission volume for this strategy (under the Strassen approximation) is 

\[ \begin{aligned} V &= R_T(20) \log M^*(20, 0.001) + R_T(32) \log M^*(12, 0.001) + R_T(38) \log M^*(6, 0.001) \\ &= (20/39) \cdot 9.2683 + (8/39) \cdot 3.9694 + (2/39) \cdot 0.5223 \\ &= 5.594 \text{ bits.} \end{aligned} \]
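
The backward recursion (16) applied to this example can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: $\log M^*$ is replaced by the Strassen approximation clamped at zero, $R_T(t)$ is taken as $(40 - t)/39$ (which reproduces the survival values used above), and all function names are ours. The alive indicator $\omega_n$ does not appear, since it has already been averaged out.

```python
import math
from statistics import NormalDist


def strassen_log_m(n, eps=0.01, eta=0.001):
    """Strassen approximation for a BSC(eps), in bits."""
    cap = 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)
    rho = eps * (1 - eps) * math.log2((1 - eps) / eps) ** 2
    value = n * cap - math.sqrt(n * rho) * NormalDist().inv_cdf(1 - eta)
    return max(0.0, value)  # clamping at zero is an assumption of this sketch


def survival_uniform(t, horizon=40):
    """R_T(t) = Pr[T > t] when death is uniform on {2, ..., horizon}."""
    return min(1.0, max(0.0, (horizon - t) / (horizon - 1)))


def expected_volume(epochs, survival, log_m):
    """Sum of R_T(e_i) * log M(n_i) with e_i the cumulative epoch boundaries."""
    total, boundary = 0.0, 0
    for n in epochs:
        boundary += n
        total += survival(boundary) * log_m(n)
    return total


def solve_dp(survival, log_m, horizon):
    """J_n(zeta) = min over s of -(1-s) R_T(n) log M(zeta+1) + J_{n+1}((zeta+1) s)."""
    cost = [[0.0] * (horizon + 2) for _ in range(horizon + 2)]  # J_{N+1} = 0
    stop = [[False] * (horizon + 2) for _ in range(horizon + 2)]
    for n in range(horizon, 0, -1):
        for zeta in range(n):
            end_epoch = -survival(n) * log_m(zeta + 1) + cost[n + 1][0]  # s = 0
            keep_going = cost[n + 1][zeta + 1]                           # s = 1
            stop[n][zeta] = end_epoch <= keep_going
            cost[n][zeta] = min(end_epoch, keep_going)
    # Roll the policy forward from the initial state zeta_1 = 0.
    epochs, zeta = [], 0
    for n in range(1, horizon + 1):
        if stop[n][zeta]:
            epochs.append(zeta + 1)
            zeta = 0
        else:
            zeta += 1
    if zeta:
        epochs.append(zeta)  # an unfinished trailing epoch carries no volume
    return -cost[1][0], epochs


if __name__ == "__main__":
    volume, epochs = solve_dp(survival_uniform, strassen_log_m, 40)
    print(epochs, round(volume, 3))
    print(round(expected_volume((13, 13, 13, 1), survival_uniform, strassen_log_m), 3))
```

The table has $O(N^2)$ entries, compared with the $2^{N-1}$ partitions of exhaustive search. Trailing epochs that carry no volume may be split differently from the partition quoted above, since ties are broken arbitrarily.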

E. A Precise Solution 

It has been assumed that optimal finite block length codes are known and used. Moreover, the Strassen 
approximation has been used for certain computations. It is, however, also of interest to determine precisely which code 
should be used over a channel that dies. This subsection gives an example where a sequence of length-23 binary 
Golay codes [54] is optimal. Similar examples may be developed for other perfect codes; a perfect code is one 
for which there are equal-radius spheres centered at the codewords that are disjoint and that completely fill $\mathcal{X}^n$. 

$^6$ Equivalently $(n_1 = 20, n_2 = 12, n_3 = 6, n_4 = n_5 = 1)$, since the last two channel usages are wasted (see Fig. 1(a)) to hedge against 
channel death. 

Before presenting the example, the sphere-packing upper bound on $\log M^*(n_i, \eta)$ for a BSC($\varepsilon$) is derived. Recall 
the notion of decoding radius [55] and let $\rho(\varepsilon, \eta)$ be the largest integer such that 

\[ \sum_{s=0}^{\rho} \binom{n_i}{s} \varepsilon^{s} (1 - \varepsilon)^{n_i - s} \le 1 - \eta. \]



The sphere-packing bound follows from counting how many decoding regions of radius $\rho$ could conceivably fit 
in the Hamming space $2^{n_i}$ disjointly. Let $D_{s,m}$ be the number of channel output sequences that are decoded into 
message $w_m$ and have distance $s$ from the $m$th codeword. By the nature of Hamming space, 

\[ D_{s,m} \le \binom{n_i}{s} \]

and due to the volume constraint, 

\[ \sum_{m=1}^{M} \sum_{s=0}^{\rho} D_{s,m} \le 2^{n_i}. \]

Hence, the maximal codebook size $M^*(n_i, \eta)$ is upper-bounded as 

\[ M^*(n_i, \eta) \le \frac{2^{n_i}}{\sum_{s=0}^{\rho(\varepsilon, \eta)} \binom{n_i}{s}}. \]

Thus the sphere-packing upper bound on $\log M^*(n_i, \eta)$ is 

\[ \log M^*(n_i, \eta) \le n_i - \log \left[ \sum_{s=0}^{\rho(\varepsilon, \eta)} \binom{n_i}{s} \right] \triangleq \log M_{sp}(n_i, \eta). \]

Perfect codes such as the binary Golay code of length 23 can sometimes achieve the sphere-packing bound with 
equality. 

Consider an $(\varepsilon, \alpha)$ BSC-geometric channel that dies, with $\varepsilon = 0.01$ and $\alpha = 0.05$. The target error probability 
is fixed at $\eta = 2.9 \times 10^{-6}$. For these values of $\varepsilon$ and $\eta$, the decoding radius $\rho(\varepsilon, \eta) = 1$ for $2 \le n_i \le 3$. It is 
$\rho(\varepsilon, \eta) = 2$ for $4 \le n_i \le 10$; $\rho(\varepsilon, \eta) = 3$ for $11 \le n_i \le 23$; $\rho(\varepsilon, \eta) = 4$ for $24 \le n_i \le 40$; and so on. 

Moreover, one can note that the $(n = 23, M = 4096)$ binary Golay code has a decoding radius of 3; thus it 
meets the BSC sphere-packing bound 

\[ M_{sp}(23, 2.9 \times 10^{-6}) = \frac{2^{23}}{1 + 23 + 253 + 1771} = 4096 \]

with equality. 
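
The decoding radius and the sphere-packing size quoted above can be reproduced with a few lines of Python. The sketch follows the definition of $\rho(\varepsilon, \eta)$ given earlier in this subsection; the function names and the error raised when no valid radius exists are our own choices.

```python
import math


def decoding_radius(n, eps, eta):
    """Largest rho with sum_{s<=rho} C(n,s) eps^s (1-eps)^(n-s) <= 1 - eta."""
    cdf, rho = 0.0, -1
    for s in range(n + 1):
        cdf += math.comb(n, s) * eps ** s * (1 - eps) ** (n - s)
        if cdf > 1 - eta:
            break
        rho = s
    if rho < 0:
        raise ValueError("no decoding radius satisfies the constraint")
    return rho


def sphere_packing_size(n, eps, eta):
    """M_sp(n, eta) = 2^n divided by the volume of a Hamming ball of radius rho."""
    rho = decoding_radius(n, eps, eta)
    return 2 ** n / sum(math.comb(n, s) for s in range(rho + 1))


if __name__ == "__main__":
    eps, eta = 0.01, 2.9e-6
    print([decoding_radius(n, eps, eta) for n in (3, 4, 10, 11, 23, 24)])
    print(sphere_packing_size(23, eps, eta))
```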

Now to bring channel death into the picture. If one proceeds greedily, following Algorithm 1 but using the 
sphere-packing bound $\log M_{sp}(n_i, \eta)$ rather than the optimal $\log M^*(n_i, \eta)$, 

\[ n_1(\varepsilon = 0.01, \alpha = 0.05, \eta = 2.9 \times 10^{-6}) = \arg\max_{\nu} (1 - \alpha)^{\nu} \log \frac{2^{\nu}}{\sum_{s=0}^{\rho(\varepsilon, \eta)} \binom{\nu}{s}} = 23. \]

By the memorylessness argument of Prop. 11, it follows that running Algorithm 1 with the sphere-packing bound 
will yield $23 = n_1 = n_2 = \cdots$. 

It remains to show that Algorithm 1 actually gives the true solution. Had Strassen's approximation been used 
rather than the sphere-packing bound, the result would follow directly from Prop. 11. Instead, the global optimality 
condition (9) can be verified exhaustively for all 23 possible shift sizes $K$ for the first epoch: 

\[ \frac{(1 - \alpha)^{23} \log M_{sp}(23, \eta) - (1 - \alpha)^{23 - K} \log M_{sp}(23 - K, \eta)}{(1 - \alpha)^{46} \log M_{sp}(23 + K, \eta) - (1 - \alpha)^{46} \log M_{sp}(23, \eta)} \ge 1. \]

Then the same exhaustive verification is performed for all 23 possible shifts for the second epoch: 

\[ \begin{aligned} &\frac{(1 - \alpha)^{46} \log M_{sp}(23, \eta) - (1 - \alpha)^{46 - K} \log M_{sp}(23 - K, \eta)}{(1 - \alpha)^{69} \log M_{sp}(23 + K, \eta) - (1 - \alpha)^{69} \log M_{sp}(23, \eta)} \\ &\quad = \frac{(1 - \alpha)^{23} \left[ (1 - \alpha)^{23} \log M_{sp}(23, \eta) - (1 - \alpha)^{23 - K} \log M_{sp}(23 - K, \eta) \right]}{(1 - \alpha)^{23} \left[ (1 - \alpha)^{46} \log M_{sp}(23 + K, \eta) - (1 - \alpha)^{46} \log M_{sp}(23, \eta) \right]} \\ &\quad = \frac{(1 - \alpha)^{23} \log M_{sp}(23, \eta) - (1 - \alpha)^{23 - K} \log M_{sp}(23 - K, \eta)}{(1 - \alpha)^{46} \log M_{sp}(23 + K, \eta) - (1 - \alpha)^{46} \log M_{sp}(23, \eta)} \ge 1. \end{aligned} \]

The exhaustive verification can be carried out indefinitely to show that using the length-23 binary Golay code for 
every epoch is optimal. 
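
The greedy choice and the first-epoch check can be reproduced numerically. The sketch below is a minimal illustration with our own function names. It compares loss and gain in the difference form $\Delta_l \ge \Delta_r$ used in the proof of Prop. 8, generalized to shift size $K$, rather than as a ratio, so that no division is needed; the search range of 1 to 100 for the blocklength is an assumption.

```python
import math


def log_m_sp(n, eps, eta):
    """Sphere-packing bound n - log2(sum_{s<=rho} C(n, s)) for a BSC(eps)."""
    cdf, ball = 0.0, 0
    for s in range(n + 1):
        cdf += math.comb(n, s) * eps ** s * (1 - eps) ** (n - s)
        if cdf > 1 - eta:
            break
        ball += math.comb(n, s)  # s is within the decoding radius
    return n - math.log2(ball)


def greedy_first_epoch(alpha, eps, eta, max_len=100):
    """arg max over nu of (1 - alpha)**nu * log M_sp(nu, eta), by direct search."""
    return max(range(1, max_len + 1),
               key=lambda nu: (1 - alpha) ** nu * log_m_sp(nu, eps, eta))


def first_epoch_check(n, alpha, eps, eta):
    """For each shift K = 0..n-1 of the first boundary, test loss >= gain."""
    surv = lambda t: (1 - alpha) ** t
    base = log_m_sp(n, eps, eta)
    results = {}
    for k in range(n):
        loss = surv(n) * base - surv(n - k) * log_m_sp(n - k, eps, eta)
        gain = surv(2 * n) * (log_m_sp(n + k, eps, eta) - base)
        results[k] = loss >= gain
    return results


if __name__ == "__main__":
    alpha, eps, eta = 0.05, 0.01, 2.9e-6
    n1 = greedy_first_epoch(alpha, eps, eta)
    print(n1, all(first_epoch_check(n1, alpha, eps, eta).values()))
```

By the scaling shown in the display above, the same comparison for later epochs differs only by a common factor of $(1 - \alpha)^{23}$ per epoch.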

F. Practical Codes and Empirical Death Distributions 

It should be noted that the algorithms developed for optimizing communication schemes over channels that die 
work with arbitrary death distributions, even empirically measured ones, e.g. the experimentally characterized death 
properties of a synthetic biology communication system [5, Fig. 3: Reliability]. 

Further, rather than considering the $\log M^*(n_i, \eta)$ function for optimal finite block length codes, the code 
optimization procedures would work just as well if a collection of finite block length codes was provided. Such 
a limited set of codes might be selected for decoding complexity or other practical reasons. As an example, 
consider the collection $\mathcal{C}$ of 9191 binary minimum distance codes of lengths between 6 and 16 given in [44, DVD 
supplement]. We run the optimization over the example in Sec. IV-D, but restricting to $\mathcal{C}$. 

The result obtained for epoch sizes is $(n_1 = 15, n_2 = 15, n_3 = 9, n_4 = 1)$. Under the Strassen approximation, 
this set of epoch sizes gives 5.344 bits, as compared to 5.594 bits under the optimal epoch sizes under the Strassen 
approximation. However, the Strassen approximation is not exact, and the actual number of bits achieved with the 
optimized epoch sizes for $\mathcal{C}$ is 7.246 bits. The two minimum distance codes used are the $(n = 15, M = 256, d = 5)$ 
code and the $(n = 9, M = 6, d = 3)$ code. It remains to be seen whether the restriction to the collection of minimum 
distance codes is actually suboptimal. 
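
The two volumes quoted for the epoch sizes $(15, 15, 9, 1)$ can be recomputed as follows. The sketch assumes the uniform death distribution of the dynamic programming example, with $R_T(t) = (40 - t)/39$, and uses a two-entry table holding only the code sizes named in the text; the table, the function names, and the clamping of the Strassen value at zero are illustrative assumptions. Whether these codes meet $\eta$-reliability is taken from the text and not checked here.

```python
import math
from statistics import NormalDist


def strassen_log_m(n, eps=0.01, eta=0.001):
    """Strassen approximation for a BSC(eps), in bits, clamped at zero."""
    cap = 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)
    rho = eps * (1 - eps) * math.log2((1 - eps) / eps) ** 2
    value = n * cap - math.sqrt(n * rho) * NormalDist().inv_cdf(1 - eta)
    return max(0.0, value)


def survival_uniform(t, horizon=40):
    """R_T(t) = Pr[T > t] when death is uniform on {2, ..., horizon}."""
    return min(1.0, max(0.0, (horizon - t) / (horizon - 1)))


def expected_volume(epochs, survival, log_m):
    """Sum of R_T(e_i) * log M(n_i) with e_i the cumulative epoch boundaries."""
    total, boundary = 0.0, 0
    for n in epochs:
        boundary += n
        total += survival(boundary) * log_m(n)
    return total


# Hypothetical code table: blocklength -> log2(M) for the two codes in the text.
CODE_TABLE = {15: math.log2(256), 9: math.log2(6)}


def table_log_m(n):
    """Bits carried by the tabulated code of length n, or zero if none is listed."""
    return CODE_TABLE.get(n, 0.0)


if __name__ == "__main__":
    epochs = (15, 15, 9, 1)
    print(round(expected_volume(epochs, survival_uniform, strassen_log_m), 3))
    print(round(expected_volume(epochs, survival_uniform, table_log_m), 3))
```

Replacing `CODE_TABLE` with a fuller collection of codes would let the same volume function serve as the objective of a restricted optimization.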

VI. Partial Ordering of Channels 

It is of interest to order channels that die by quality. The partial ordering of DMCs was studied by Shannon [56], 
and as a first step, we can slightly extend his result to order channels that die having common death distributions. 

Definition 15: Let $p(i, j)$ be the transition probabilities for a DMC $C_1$ and let $q(k, l)$ be the transition probabilities 
for a DMC $C_2$. Then $C_1$ is said to include $C_2$, $C_1 \supseteq C_2$, if there exist two sets of valid transition probabilities 
$r_\gamma(k, i)$ and $t_\gamma(j, l)$, and there exists a vector $g$ with $g_\gamma \ge 0$ and $\sum_\gamma g_\gamma = 1$, such that 

\[ \sum_{\gamma, i, j} g_\gamma\, r_\gamma(k, i)\, p(i, j)\, t_\gamma(j, l) = q(k, l). \]

Proposition 15: Consider two channels that die with identical death distributions: $(\mathcal{X}_1, p_a, p_d, p_T(t), \mathcal{Y}_1)$ and 
$(\mathcal{X}_2, q_a, q_d, p_T(t), \mathcal{Y}_2)$. Let DMC $C_1$ correspond to $p_a$ and let DMC $C_2$ correspond to $q_a$ and moreover suppose that 
$C_1 \supseteq C_2$. Fix a transmission time $N$ and an expected transmission volume $V$. Let $\eta_1$ be the best level of reliability 
for the first channel and $\eta_2$ be the best level of reliability for the second channel, under $(N, V)$. Then $\eta_1 \le \eta_2$. 

Proof: The main theorem of [56] proves that the average error probability when transmitting an individual 
message code over $C_1$ is less than or equal to the average error probability when transmitting the same individual 
message code over $C_2$. 

Shannon's proof [56] holds mutatis mutandis for maximum error probability, replacing "average error probability" 
by "maximum error probability." 

The desired result follows by concatenating individual message codes into a code. ■ 
We can also order channels that die having common alive state transition probabilities. 

Definition 16: Consider two random variables $T$ and $U$ with survival functions $R_T(\cdot)$ and $R_U(\cdot)$ respectively. 
Then $U$ is said to stochastically dominate $T$, $U \ge_{st} T$, if $R_T(t) \le R_U(t)$ for all $t$. 

Proposition 16: Consider two channels that die with identical state properties: $(\mathcal{X}, p_a(y|x), p_d(y|x), p_T, \mathcal{Y})$ and 
$(\mathcal{X}, p_a(y|x), p_d(y|x), q_U, \mathcal{Y})$. Let death random variable $T$ correspond to $p_T$ and let death random variable $U$ 
correspond to $q_U$ and moreover suppose that $U \ge_{st} T$. Fix a transmission time $N$ and a level of reliability $\eta$. Let 
$V_1$ be the best expected transmission volume for the first channel and $V_2$ be the best expected transmission volume 
for the second channel, under $(N, \eta)$. Then $V_2 \ge V_1$. 

Proof: Recall the expected transmission volume expression (7) for the first channel: 

\[ \max_{(n_i):\, \sum_i n_i = N} \sum_i R_T(e_i) \log M^*(n_i, \eta) \]

and for the second channel: 

\[ \max_{(\nu_i):\, \sum_i \nu_i = N} \sum_i R_U(f_i) \log M^*(\nu_i, \eta). \]

Since $R_T(t) \le R_U(t)$ for all $t$, the result follows directly. ■ 

These two results give individual ordering principles in the two dimensions essentially depicted in Fig. 3. Putting 
them together provides a partial order on all channels that die: if one channel is better than another channel in both 
dimensions, then it is better overall. 

Proposition 17: Consider two channels that die: $(\mathcal{X}_1, p_a, p_d, p_T, \mathcal{Y}_1)$ and $(\mathcal{X}_2, q_a, q_d, q_U, \mathcal{Y}_2)$. Let DMC $C_1$ 
correspond to $p_a$ and let DMC $C_2$ correspond to $q_a$ and moreover suppose that $C_2 \supseteq C_1$. Let death random variable 
$T$ correspond to $p_T$ and let death random variable $U$ correspond to $q_U$ and moreover suppose that $U \ge_{st} T$. Fix 
a transmission time $N$ and a level of reliability $\eta$. Let $V_1$ be the best expected transmission volume for the first 
channel and $V_2$ be the best expected transmission volume for the second channel, under $(N, \eta)$. Then $V_2 \ge V_1$. 

VII. Conclusion and Future Work 

We have formulated the problem of communication over channels that die and have shown how to maximize 
expected transmission volume at a given level of error probability reliability. 

There are several extensions to the basic formulation studied in this work that one might consider; we list a few: 

• Inspired by synthetic biology, rather than thinking of death time as independent of the signaling scheme 
$X_1^n$, one might consider channels that die because they lose fitness as a consequence of operation: $T$ would 
be dependent on $X_1^n$. This would be similar to Gallager's panic button/child's toy channel, and would have 
intersymbol interference [31], [34]. There would also be strong connections to channels that heat up [57] and 
communication with a dynamic cost [58, Ch. 3]. 

• In the emerging attention economy [59], agents faced with information overload [60] may permanently stop 
listening to certain communication media received over noisy channels. This setting is exactly modeled by 
channels that die. The impact of communication over channels that die on the productivity and efficiency of 
human organizations may be determined by building on the results herein. 

• Since channel death is indicated by the symbol ?, the receiver unequivocally knows death time. Other channel 
models might not have a distinct output letter for death and would need to detect death, perhaps using the 
theory of estimating stopping times [61]. 

• Inspired by communication terminals that randomly lie within communication range, e.g. in vehicular 
communication, one might also consider a channel that is born at a random time and then dies at a random time. 
One would suspect that channel state feedback would be beneficial. Networks of birth-death channels are also 
of interest and would have connections to percolation-style work. 

• This work has considered only the channel coding problem; however, there are several formulations of 
end-to-end information transmission problems over channels that die, which are of interest in many application 
areas. There is no reason to suspect a separation principle. 

Randomly stepping back from infinity leads to some new understanding of the fundamental limits of communication 
in the presence of noise and unreliability. 